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We propose a model for evolution aiming to reproduce statistical features of fossil data, in partic- 
ular the distributions of extinction events, the distribution of species per genus and the distribution 
of lifetimes, all of which are known to be of power law type. The model incorporates both species- 
species interactions and ancestral relationships. The main novelty of this work is to show the 
feasibility of fc-core percolation as a selection mechanism. We give theoretical predictions for the 
observable distributions, confirm their validity by computer simulation and report good agreement 
with the fossil data. A key feature of the proposed model is a co-evolving fitness landscape deter- 
mined by the topology of the underlying species interactions, ecological niches emerge naturally. 
The predicted distributions are independent of the rate of speciation, i.e. whether one adopts an 
gradualist or punctuated view of evolution. 



PACS numbers: 

INTRODUCTION 

A promising line of interdisciplinary research was initi- 
ated when it became apparent that a series of statistical 
facts in various types of fossil data can not be obtained by 
a straight forward extension of existing ways of describ- 
ing species proliferation and extinction over geological 
timescales. The recent quantitative interest in evolution- 
ary models originates from the fact that fossil data from 
different sources [H, 0] shows power law behaviour with 
typical exponents for three observables: (i) the distri- 
bution of sizes of extinction events, (ii) the lifetime of 
species and (iii) the number of species per genus, see e.g. 
[3( for an overview. 

One of the first quantitative models of evolution was 
the NK model proposed by Kauffman Q , where species 
evolve and compete on a rugged fitness landscape. A 
species' fitness and therefore lifetime is given by its 
genome and the randomly associated fitnesses to the 
respective genes. In a similar vein Bak and Sneppen 
[B| refined Kauffman's ideas to a model exhibiting self- 
organized criticality. Here it is assumed that the fitness 
landscape possesses valleys and peaks and over time a 
species will mutate "across" a fitness barrier to an adja- 
cent peak. In contrast to these models, where there is no 
explicit species-species interaction, Sole and Manrubia @ 
constructed a model focusing on interspecies dependen- 
cies. They incorporate a connection matrix containing 
the mutual support between two species. If this support 
drops below a critical value the species will not be able 
to maintain its existence anymore, it will go extinct. In 
contrast to the NK and Bak-Sneppen model which are 
per se critical, the Sole-Manrubia model's criticality is 
parameter dependent. For a review see again 

Recently a more general and abstract framework to 
treat systems subject to evolution was developed out of 



the notion of catalytic sets on networks @, [1| • We will 
use this approach to model species proliferation in the 
evolutionary system. As the main novelty of the present 
work we study the feasibility of fc-core percolation as a 
selection mechanism, fc-core percolation is a systematic, 
iterative procedure where a node in a network is removed 
from a network if it sustains less than a fixed number of fc 
links to other nodes. We show that evolutionary systems 
which grow according to a catalytic set dynamics com- 
bined with a fc-core selection mechanism, reproduce the 
observed power law behaviour in good agreement with 
the fossil data for the three observables: size of extinc- 
tion events, lifetime and number of species per genus. 
The model explicitly describes the origination of species 
and their interactions, the fitness landscape is co-evolving 
with the topology specified by these interactions. 



THE MODEL 

In the following species are represented as nodes in a 
network. We introduce two types of links between these 
nodes, the first type keeping the ancestral relations, the 
other type describing the interactions between species. 
These links are recorded in two separate adjacency ma- 
trices, as described below. 



Growth 

The system is initiated with a small number Nq of 
species. These are assumed to be constantly present i.e. 
they are not subject to the selection mechanism. New 
species (nodes) are introduced as mutations of already 
existing ones. They will prove viable only if they receive 
some "support" from other species. At each time step a 



species may be subject to a mutation which leads to a 
new node. The mutation is favored/suppressed through 
the influence of other already existing species. Wc iden- 
tify the probability for the occurrence of a viable muta- 
tion with the effective growth rate A of the system. In 
the absence of any selection mechanisms or extinctions 
the system diversity grows according to N (t) = Noe xt . 
To take into account ancestral relationships we introduce 
the ancestral table a, a three dimensional tensor with en- 
tries ctijk G {0,1}. Suppose that species i mutates and 
gives rise to a new species k and that species j provides 
support for the survival of k. In this case the ancestral 
adjacency matrix element otijk = 1, otherwise ctijk = 0. 
Each species is associated to a genus. If species i is from 
genus <?j, its mutant k will most likely be assigned to the 
same genus <?;. However, with a small probability p gen 
the mutation will be large enough that k constitutes a 
new genus gk ^= g%- The results will, as discussed later, 
only marginally depend on the actual choice of p gen , we 
worked with a figure of p gen = 0.005. 

On top of this ancestral relationship, a new species will 
also interact with other species in its surrounding. The 
environment of a new species - its ecological context - 
will be strongly determined by the environment of its 
ancestors, i.e. the species the ancestors interact with. 
A given species k (descending from i) will thus be most 
likely to interact with more or less the same species as i. 
k receives a given fraction of interaction- links from i. 

As a consequence of this growth rule with the partic- 
ular copying mechanism, clusters of strongly intercon- 
nected, interacting species naturally emerge. In other 
words, species in a cluster are highly adapted to each 
other and form an environment to which can be referred 
to as an " ecological niche" . 

Interspecies dependencies are encoded in the interac- 
tion matrix I^j. A general choice for the entries in I 
would be to introduce a probability that an entry is non- 
zero, i.e. there is an interaction between species i and j, 
and in this case let the values of I vary between —1 and 
+1, for inhibitive and stimulating influences. Evolution- 
ary dynamics of this kind has been studied [Kj and it has 
been shown to lead to a proliferation of predominantly 
stimulating influences (positive entries). Thus, since here 
we are interested in a model for macro-evolution and not 
ecology, we assume only positive binary entries in /, i.e. 
Iij G {0,1} for no interaction, or stimulating influence, 
respectively. 
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FIG. 1: At time t (left panel) node i is chosen to mutate. 
It has one incoming link from j and two outgoing links. At 
time t + 1 (right panel) node k came into existence through 
the mutation of i under the supportive influence of j. Here 
m = 1, i.e. k copies all outgoing links from i. 



Growth dynamics 

The model consists of a two-step process: a growth and 
diversification process, followed by a selection procedure. 
During one time step we apply the following procedure 
to each node in a random update: 

• Pick a node i. With probability 1 — A (same for 
all i) do nothing and pick another node, otherwise 
with probability A do the following: 

• Choose at random one of the nodes linking to i, say 
node j. Add a new node k to the network which is 
a mutation of either i or j. Set either ctijk = 1 or 
ctjik — 1 with equal probability. This means that 
either i or j has mutated. 

• Let us assume i mutated. Then the new species 
k receives an incoming link from i (Jjfc = 1) and 
copies each outgoing link from % with a probability 
m, i.e. if i links to i' (Iw — 1), k links to i' with 
probability m. (If it links we set Iki> = 1). 

• With probability p gen the new species k constitutes 
a new genus, otherwise k is associated with the 
same ancestor genus i. 

Effectively, we employ a 'copying mechanism', where a 
node i gets copied (produces node k) together with the 
two types of links involved: In the case of the ances- 
tral relationships either a link to i is established or, with 
same probability, one incoming link of i, namely from 
j, is copied. In the case of the species interactions each 
outgoing link from i is copied to k with probability m. 
See Fig. [T] for an illustration. A copying mechanism of 
this kind has been studied in |11| and was applied in the 
context of protein interaction networks [l3 |. 



For later use, the indegree of node i, k™ is defined as 
the sum of the i-th line of the interaction matrix /, i.e. 
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eJV(t)} 



I X i. Note that the number of species 
N(t), and thus matrices a and / are non-constant over 
time. 



Selection as k-core pruning 

By assuming that selection predominantly acts on 
species of low fitness, a quantitative measure for fitness is 
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FIG. 2: We compare the prediction from Eq. ((2} (solid line) 
with simulation data (red circles) for = 1. 



necessary. It was argued that a species' individual fitness 
should be related to the number of stable relationships 
that this species is able to maintain in its environment 
(l3| . The higher this number, the more interactions en- 
sure its survival. In this view one can directly identify the 
indegree of species i, K\ n , with its fitness; one can picture 
k™ as the total 'support' i gets from its surroundings. In 
this view it is natural to implement the selection proce- 
dure in the following way: 

Suppose there exists an exogenous stress level for all 
species, k stress which fluctuates due to abiotic causes. 
It can be modelled as a random process drawn at each 
time step from a Poisson distribution Pr (k stress — n ) = 
(9 n e~ e ) jn\. The mean 9 of this distribution gives the 
average biotic stress in the system. Species with < 
^stress become removed from the network with all their 
links. As soon as these nodes are removed some of the 
surviving nodes will now have an indegree smaller than 
^stress anQ i Decome extinct too, and so on. In other 
words, at each time step only the fc-core of the network 
survives, the network is pruned down to its fc-core. 



THEORETICAL ESTIMATES 



We now estimate the distributions of three quantities 
which are observable in fossil data, extinction events, life- 
time and species per genus. These are known to be com- 
patible with power-law distributions, with exponents be- 
tween 1.5 and 2 We analytically derive the exponents 
for extinction size je, number of species per genus 75, 
and lifetimes jl, and discuss parameter (in)dcpcndcncc 
of the results. We then compare them to simulations at 
the end of this section. 



Size distribution of extinctions 

We are interested in the number of species becoming 
extinct in each time step, i.e. the distribution of ex- 
tinction sizes. It can be derived analytically by mak- 
ing some simplifying assumptions. A node i's indegree 



is given by «f = J2i x 



{xeN(t)} ± xi- 



Since each node re- 



ceives an incoming link from its ancestor, the minimal 
indegree in the network is one. Thus each species can 
survive if we prune the network with k stress € {0, 1}. 
The probability p sur v for the occurrence of a stress level 
k stress 1 which does not lead to a single extinction event, is 
given by a Poisson process p sur v = J2n=o ($™ e ~ e ) = 
e~ 6 (1 + 8). With probability p surv the diversity prolif- 
erates as A (t + 1) = N (t) (1 + A). We assume that the 
main contribution to extinction sizes stem from percola- 
tion with k 8tress — 2, which is the case for reasonable 
choices of the parameters A and 9. By reasonable choices 
we mean values for A and 9 where a nontrivial interplay 
between the growth and extinction dynamics can de facto 
be observed. Otherwise, keeping A fixed and choosing 9 
too low the system would just grow exponentially, con- 
versely for too high 9 all species would vanish within a 
few iterations. We further make the simplifying assump- 
tion that if k stress > 1 occurs, a constant fraction c of the 
entire population will go extinct. Thus from our assump- 
tions follows the Ansatz that with probability 1 — p surv 
the diversity behaves like N (t + 1) = N (t) (1 - c). 

Let us call the number of species becoming extinct 
at each time step AN^ (t) and assume that AN^ (t) = 
cN (t). Then we have A Aft (t) /N = exp (Xt), or equiva- 
lent^ t = (l/A)ln(AAt (t) /N ). Assume that an ex- 
tinction event occurs at time t + 1. The probability 
that the system has proliferated over the past T it- 
erations is given by p T surv (T being an exponent), so 
the probability to find a specific extinction size AN* 
is given by Pr (AA* (t) = AN*) = pft$ MAJV* (*)/%)_ 
Taking the natural logarithm on both sides and plugging 
in for p surv we finally have In (Pr (A At (t) = AN*)) = 
const + [(-0 + In (1 + 6*)) /A] In AA*. Thus the distribu- 
tion of extinction sizes follows a power-law with exponent 
7e depending on A and 9: 



Pr (AA f (t) = AN* 



IE 



cx (AA*) 
6" — In (1 



-7b 



(1) 



A 



A comparison between this prediction and simulation re- 
sults from the full model (without assumptions) is shown 
in Fig. [5] for 9 = 1, revealing excellent agreement. Slopes 
from the simulation data were estimated using a max- 
imum likelihood method LJ], standard deviations are 



smaller than symbol size. This a posteriori justifies our 
simplifying assumption A N' (t) — cN (t) . The difference 
to simulation data stems from the fact that also percola- 
tions with higher k occur albeit exponentially less likely. 
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Distribution of species per genus 

Whereas the extinction-size distribution displays ex- 
plicit parameter on A and 9 dependence, this will be 
shown to be not the case for the distributions of species 
per genus and lifetimes. Let us start with the indegree 
distribution of our growth model p [Kf 1 ) encoded in /, 
which is known to be scale-free Growing networks 
have scale-free degree distribution if they incorporate 
preferential attachment. How is preferential attachment 
present in the present model? Consider the avenue of 
a new species k due to a mutation of i under the sup- 
portive influence of j and a randomly chosen, already 
existing node /. What is the probability that the inde- 
gree of I, k™, will increase by one? This can happen if I 
receives an incoming link from k because it already has 
an incoming link from i which happens with a probability 
proportional to the indegree of node I. This introduces 
preferential attachment and the resulting indegree distri- 
bution, as worked out in llj , follows a power law with 



pin 



(2) 



as long as the link-copying probability m > 0.4 (1) [llj , 
which we assume to hold. 

Suppose our system size is N species. Denote the num- 
ber of genera containing n s species by n g (n s ,N), It is 
then straight forward to derive the growth equation 

h g (n s ,N + 1) = n g (n s , N) - n g (n s , N)w(n s ) + (3) 
+n g (n s - 1, N)w(n s - 1) , 

where w(n s ) is the probability for each genus of size n s 
to increase its size by one. The dependence on p gen is 
introduced in the boundary conditions given by n g (n s — 
1,N + 1). For each node associated to an already existing 
genus, the number of p gen /(l — p gen ) nodes are added to 
this one per time step, so we get n g (l, N+l) = n g (l, N) — 
w(l)fi g {l,N) + p gen /(l - p gen ). We are interested in 
stationary solutions of Eq. [4] i.e. solutions which are 
independent of the actual system size N. For this let 
us define n g (n s ) = Nn g (n s ,N). The probability for a 
genus of size n s to increase its size by one is obviously 
w(n s ) = n s /N, this can be interpreted as the probability 
that a new node copies the genus information from a 
node of a genus of this respective size. Plugging all this 
into Eq. |4] we get the recursive relationship n g (n s ) — 
[(n s — l)/(n s + 1)] • n g {n s — 1) from which one can readily 
conclude n g (n s ) = f(p gen ) ■ (n s (n s + 1))" 1 where f(p gen ) 
is a constant, thus we have n g (n s ) oc n~ 2 to leading order. 

An important feature of fc-core percolation is that it 
preserves statistical invariants [l5| , that is, if the original 
network follows a scale-free degree distribution with a 
given exponent, its fc-core has the same distribution up 
to the cut-off at k. The scale-free network architecture 
imposed by our growth and diversification rules will not 



be altered by extinction events. Thus the species per 
genus distribution of the model is 

n g (n s ) oc n~ 2 , (4) 

i.e. 7s = 2, for the distribution of taxon sizes. 



Lifetime distribution 

To estimate for the distribution of lifetimes of long 
living species, i.e. species which will not become ex- 
tinct after the first few iterations, we ask for the life- 
time Ti of species i with an indegree Kf 1 drawn from 
p (Kf 1 ) . The probability that the stress level will be 
higher than the node's indegree is Pr (k stress > Kf 1 ) = 
Y^k=K irl +i ( e ~ 6 9 k ) /kl- The leading term in this sum is 
k = nf 1 + 1. Consider that T iterations of the dynamics 
have taken place. We are asking for long lived species, i.e. 
that within T>1 iterations there occurs no stress level 
higher than k™ . Generally, the probability that within 
T trials with success probability Pr (k stre - ss > zero 
successes are obtained is given by a binomial distribu- 
tion. For large sample sizes T the binomial distribution 
approaches a Poisson distribution, independent of T. Ac- 
cordingly, in our case the probability for zero successes 
(the occurrence of no stress level k stress > K\ n ) follows 
a Poisson distribution e -Pr ( fc >Ki ). From this it can 
be concluded that the probability to encounter a species 
i with lifetime r, i.e., Pr (tj = r), can be estimated from 
the node's indegree only. One can identify a necessary 
criterion for the survival of a node, namely that it has 
an indegree Kf 1 which is not exceeded by the stress level 
^stress j? or t ^> 1 iterations. So the probability to en- 
counter a lifetime r is given by the probability for the 
occurrence of a stress level higher than Kf 1 , 



Pr { n = r) oc p (Kf 1 ) e Pl '( fcStr " s >«r) 



(5) 



The probability to find a node with lifetime r is propor- 
tional to the probability of finding a node with a given 
indegree k™, truncated with the probability for the oc- 
currence of specific stress levels. There exists a regime 
where Pr (t^ = r) oc Pr (k" 1 = K m ) holds and by virtue 
of Eq. © we find j L = 2, i.e. 



Pr {ji — t) oc t 



Simulations 



(6) 



We compare simulation results of the presented model 
to fossil data for extinctions and lifetime drawn from 
[H, as well as species per genus after Q in Fig. [3] 
The model was implemented in a MatLab program and 
executed until a statistics of 2 ■ 10 4 extinction events 
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FIG. 3: Comparison of fossil data [l|, Q] (solid line with diamonds) with simulation data (red circles) for the three observables 
(a) extinction event size, (b) species per genus and (c) lifetimes. The straight line is the maximum likelihood estimate for the 
power-law exponent of the simulation data and indicates the range where the fit was applied. 



were accumulated. This corresponds to sample sizes of 
10 5 — 10 6 for individual lifetimes and numbers of species 
per genus, depending on the parameter settings. For 
Xp 

s wrv < c(l — Psurv) the size of the network does not 
diverge over time and the samples can be obtained from 
a single run of the simulation. For Xp SU rv > c (1 — p SU rv) 
the system tends to grow infinitely large; for practical 
purposes we aborted runs as soon as N (t) > 10 4 and 
iterated until a satisfactory statistics was reached. 

For the extinction events the resulting slope 
is parameter dependent, we used the setting 
(A = 0.15, m = 1, 6 = 1) to obtain agreement with 
the slope of je = 2.0(2) from the fossil data [lij ]. 
For the number of species per genus and lifetimes the 
distributions are independent of the parameter settings 
and given by the topology (which is a scale-free indegree 
distribution for values of m > 0.4(1)) of our network 
only, yielding 75 = 2,7l = 2. For all three subplots the 
simulation data was fitted with a maximum likelihood 
estimation the range of the fit is indicated by the 
range of the straight line. Subsequently the numerical 
results were binned logarithmically and, if necessary, 
shifted multiplicatively to enhance the clarity of the 
plots. Our exponents are summarized and compared to 
several previous models in Table HI 



TABLE I: Exponents of the distributions of extinction sizes 
7b, species per genus 75, and lifetimes 7_l, as obtained from 
the fossil record and compared to the exponents of various well 
known evolution models. The value for 7_b from this model 
was obtained from simulations with A = 0.15, m = 1, 9 = 1. 





7b 7s 1l 


fossil data 

Kauffman 

Bak and Sneppen 

Sole and Manrubia 

Newman 

present model 


2.0(2) 1.7(3) 1.5(1) 


~1 

1 to 3/2 1 

2.05(6) - 2.05(6) 

2.02(2) 1.03(5) 1.6(1) 

2.049(8) 2 2 



DISCUSSION 

We presented a model for evolution which reproduces 
statistical features observed in fossil data. An evolu- 
tionary system is modelled as a catalytic network with 
two superimposed network topologies, one incorporat- 
ing species-species interactions, the other the phyloge- 
netic tree structure. The fitness of species is given by 
the connectivity structure of the network, thus naturally 
a co-evolving fitness landscape arises. Fitness becomes 
nothing but a co-evolving topological entity, the more 
relationships a species is able to build and sustain, the 
fitter it becomes. Species interactions are introduced by 
a variant of preferential attachment known as 'copying 



mechanism' ll(. Without any further assumptions this 
mechanism leads to a natural emergence of "ecological 
niches" , which in network terms relate to a high degree 
of clustering in the network. 

In this model we have taken a gradualist viewpoint 
concerning speciation in assuming that the growth rate 
A is constant. However, this choice was only made for 
reasons of simplicity. Benton and Pearson 17| propose 
that gradual speciations are more likely to occur in stable 
environments (as it is the case for e.g. marine plankton), 
whereas marine invertebrates and vertebrates are more 
likely to show a punctuated pattern of speciation. The 
latter case could be naturally introduced in our model by 
assuming a functional dependence A = A(f9), i.e. intro- 
ducing a mechanism that couples the growth rate with 
the actual values of k stress . Irrespective of this choice, the 
main characteristics of our model would not be altered. 
The number of species per genus and lifetimes only de- 
pends on topological features of the network which would 
not be affected by a varying growth rate. Our analysis 
for the extinction sizes would hold too, except that one 
has to set A = A (9) in Eq. ([2]) . The existence of the 
power law is independent of both the functional form of 
the growth rate and the stochastic stress level. 

Our selection mechanism differs from the one studied 
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by Sole and Manrubia [6| in that extinction avalanches 
spread over successive time steps in their model and that 
each species becoming extinct is immediately replaced 
by a randomly chosen one (therefore leading rather to 
a model for ecology where empty niches are re-fillcd), 
whereas in our model the selection mechanism acts on 
a 'snapshot' of the population and does not depend on 
which randomly chosen species replaces an extinct one. 
Our pruning procedure further differs from the selection 
mechanism adopted by Newman [3| in that each species 
has a randomly assigned fitness value (independent of in- 
terspecies relationships) and species below a given stress 
level become extinct, which is contrasted by mass extinc- 
tions of causally connected species in our model. 

We suggested the use of fc-core percolation as a mech- 
anism to select species according to their fitness values. 
On a technical level this allows to understand the system 
by studying its fc-core architecture. If the applicability of 
this mechanism to prune the 'tree of life' can be justified 
beyond the statistical features presented here, remains 
an open question. 
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